unaccusativity syntax or picture difficulty?

psycholinguistics
machine learning
experimental methods
CLIP
unaccusativity
I am obsessed with early advance planning, but let me make sure about some picture saliency
Author

Utku Turk

Published

January 16, 2026

Prelude

What?

During my PhD, I’ve become fairly obsessed with production studies. I find them extremely interesting, especially the way they combine what we know from theoretical linguistics with creative experimental methods. Not to mention that the theoretical framing of production work is quite poor and the main theories in use need a major overhaul. One of the most interesting papers in this area is Shota Momma’s work on advance verb planning. Similar work was also done in German and Basque by Sebastian Sauppe’s group.

Let me set the scene. He and his colleagues ran multiple picture description experiments where participants saw pictures that elicit sentences like:

  • “The octopus below the spoon is swimming” (unergative)
  • “The octopus below the spoon is boiling” (unaccusative)

If you’re not a syntax nerd, here’s the ultra-compressed version: verbs like “swim” and “bark” (unergatives) are different from verbs like “sink” and “melt” (unaccusatives), even though they both describe single-argument events. The difference has to do with argument structure—where the subject comes from in the underlying syntax. It has been argued that the subjects of unaccusatives are actually ‘deep objects’ for lack of a better term, and they structurally start in the same position as any other object.

They showed that these two verb types behave differently in production experiments: speakers plan them differently. They tested this by superimposing a written word on each picture, either related or unrelated to the target verb. When the superimposed word was related to the verb, participants were slower to start speaking, but only with unaccusatives. His theoretical claim was that unaccusative verbs are planned earlier in the sentence production process, possibly right at the beginning, along with the subject.

Why though?

Here’s the thing. Another thing that makes me very excited about production work is that there are so many possible confounds to check. I love this song and dance in psycholinguistics, where I can stress-test findings and see how stable they are. It’s especially important when you find an unexpected result, like participants taking longer to start speaking because of a verb they won’t say for at least 3 more seconds. My first instinct, and I hope yours, is to wonder: “Is this real, or is something else going on?”

This post is built on a very specific worry: What if unaccusative scenes themselves, and not the syntax of them, created the results? One interesting finding in Shota Momma’s papers was that unergative planning was seemingly invisible. He has shown that there are reasons to believe that it happens while saying the second NP. But quantitatively, the signature of unergative planning seems to be more dissolved throughout the sentence, while the unaccusative planning is strikingly clear.

This raises the following question. Suppose the unaccusative pictures were simply harder to understand, or the subject was more entangled with the action. Participants would then spend more time up front, either understanding the event or extracting the subject from it, and during this extra time a deterministic analysis of the written word could kick in and slow them down when the word was related. Unergative subjects, by contrast, are more easily dissociable from the event, since nothing is happening to them in those pictures. Extracting them takes less time and fewer resources, so no additional process gets the chance to interfere. This account makes several predictions. First, in follow-up experiments where the unergative subjects are hard to ‘retrieve’ from the scene, one should see similar onset effects. Second, if there is some sort of picture-difficulty metric, the advance planning effect should align with that metric item-wise.

The second prediction is going to be the basis of this blog post, where we will find a way to quantify the picture difficulty.

I make assumptions

I assume the following ‘two-way’ distinction with respect to lexical verbs. However, one needs to admit that unaccusativity is not stable all the time. Many such unaccusative verbs can be used as unergatives given some adverbial modification or different contexts. This would create some minor infelicity in English, but that is not the case for many languages. For example, Laz can make any verb ‘agentive’ with a small prefix. Imagine a Laz-type English where you have “I cried” vs. “I do-cried,” where the second one means that you made yourself cry or you deliberately cried. Or a better example might be: imagine if English “jump” were decomposable into a prefix “do-” and “fall.” So, for now I only assume that these properties are lexical properties of the verb, but one needs to admit that these are event-related ones.

  • Unergative actions (swimming, barking, running): The action is performed by the agent. You can see the octopus swimming—the action is somewhat separable from what happens to the entity.
  • Unaccusative actions (boiling, melting, sinking): Something is happening to the entity. The octopus isn’t “doing” boiling—it’s undergoing a change of state. The action and the entity are less separable.

Another assumption I make is about CLIP/VLM. The input that CLIP takes is a written sentence and a picture. I am fully aware that the way CLIP assesses pictures is nowhere near how humans do.1 I am also aware that in human speech, the scene is what gets encoded and the speech is the decoding. CLIP works differently. CLIP is a two-encoder model: given a picture and a text, it creates two separate vectors and checks how similar those vectors are. Thus, it does not tell us anything about human cognition, but it gives us a way to quantify relevant metrics. Below are what I take to be the model of human speech production, based on Levelt’s work, and CLIP’s architecture.

Levelt’s Speech Production Model:

flowchart TD
    A[Conceptualizer] --> |Preverbal Message| B[Formulator]
    B --> |Grammatical Encoding| C[Mental Lexicon<br/>Lemmas]
    C --> B
    B --> |Phonological Encoding| D[Mental Lexicon<br/>Forms]
    D --> B
    B --> |Phonetic Plan| E[Articulator]
    E --> |Overt Speech| F[Overt Speech]
    F --> |Auditory Feedback| G[Speech Comprehension<br/>System]
    G -.-> A

    style A fill:#e1f5dd
    style B fill:#d4e9f7
    style C fill:#fff3cd
    style D fill:#fff3cd
    style E fill:#ffd4e5
    style F fill:#f8d7da
    style G fill:#e8e8e8

CLIP Architecture:

flowchart TD
    A[Picture] --> B[Image Encoder]
    C[Text] --> D[Text Encoder]
    B --> E[Image Embedding]
    D --> F[Text Embedding]
    E --> G[Similarity Score]
    F --> G

    style A fill:#e1f5dd
    style C fill:#e1f5dd
    style B fill:#d4e9f7
    style D fill:#d4e9f7
    style E fill:#fff3cd
    style F fill:#fff3cd
    style G fill:#f8d7da

Multimodal LLMs:

More recently, multimodal large language models have emerged that work quite differently from CLIP. Instead of creating separate embeddings and comparing them, these models integrate visual and textual information into a unified representation and can generate natural language descriptions or answers about images.

I have to say, writing their code is also a bit funny. You basically have to build a pipeline where you create a ‘chat template’ and ask them to give you an output. I am not sure that is how you are supposed to use them, but it works.2

Models like Qwen3-Omni take both images and text as input, process them through vision encoders and language models together, and generate coherent text outputs. Unlike CLIP’s similarity metric, multimodal LLMs can provide richer, more nuanced interpretations of visual scenes and answer complex questions about them. We will use both of them and compare here.

flowchart TD
    A[Picture] --> B[Vision Encoder]
    C[Text Prompt] --> D[Tokenizer]
    B --> E[Visual Tokens]
    D --> F[Text Tokens]
    E --> G[Unified LLM]
    F --> G
    G --> H[Generated Text Output]

    style A fill:#e1f5dd
    style C fill:#e1f5dd
    style B fill:#d4e9f7
    style D fill:#d4e9f7
    style E fill:#fff3cd
    style F fill:#fff3cd
    style G fill:#ffd4e5
    style H fill:#f8d7da

Lastly, the original experiments were conducted as extended picture-word interference (PWI) experiments, where participants saw a picture with a word superimposed on it. Neither the pictures nor the tasks I improvise here have any relation to the picture-word interference task. It would indeed be interesting to understand what PWI would look like in terms of LLM tasks. However, that is far from what I would like to achieve here. If I get that idea, I will probably submit a paper or an abstract somewhere :).

Predictions

If unaccusative actions (like “boiling” or “melting”) are genuinely harder to see in pictures, or if the subjects are harder to visually identify in the scenes, we’d expect:

  • Lower similarity scores between the images and their target sentences
  • Evidence that models struggle to “ground” the sentence/entity in the visual input, in the form of subject saliency

If that’s the case, we have a problem—the onset latency effect might just be about picture difficulty.3

But if the similarity scores are comparable or higher for unaccusatives, then we can rule out the perceptual confound for now and be more confident that the effects reflect genuine linguistic processing.

Model Base

CLIP

CLIP (Contrastive Language-Image Pre-training) is a neural network trained on 400 million image-text pairs from the internet. It learns to match images with their corresponding text descriptions by projecting both into a shared embedding space.
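To make the "shared embedding space" idea concrete, here is a minimal sketch of the comparison step in plain Python. The vectors are made-up toy values, not real CLIP embeddings, and `LOGIT_SCALE = 100.0` is an assumption based on the OpenAI CLIP README, which describes the returned logits as cosine similarities times 100.

```python
import math

# Assumption: the released OpenAI CLIP checkpoints multiply cosine similarity
# by roughly 100 before returning logits (as described in the CLIP README).
LOGIT_SCALE = 100.0


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_style_logit(image_embedding, text_embedding):
    """Scaled cosine similarity, mimicking what a CLIP-style score reports."""
    return LOGIT_SCALE * cosine_similarity(image_embedding, text_embedding)


# Toy 3-d vectors (hypothetical; real ViT-B/32 embeddings have 512 dimensions)
toy_image = [1.0, 0.0, 0.0]
toy_text_same = [2.0, 0.0, 0.0]        # same direction, different length
toy_text_orthogonal = [0.0, 3.0, 0.0]  # unrelated direction

print(clip_style_logit(toy_image, toy_text_same))        # 100.0
print(clip_style_logit(toy_image, toy_text_orthogonal))  # 0.0
```

Under this reading, a score in the high 20s corresponds to a cosine of roughly 0.28, and only the direction of the two vectors matters, not their length.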

Setting Up

Let’s start by loading the packages we’ll need. I’m going to build this up step by step, just like I did when I first ran this analysis.

import os
import torch
import clip
from PIL import Image
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set up plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

First, we need to load the CLIP model. I’m using the ViT-B/32 variant, which is a good balance between performance and computational efficiency:

# Load the two-encoder CLIP model
# Note: We use CPU for everything if MPS is detected to avoid moondream2 issues
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    device = "cpu" 
else:
    device = "cpu"

model_clip, preprocess = clip.load("ViT-B/32", device=device, jit=False)

print(f"Using device: {device}")
print(f"CLIP model loaded successfully!")

Now let’s also load a multimodal LLM for comparison. We’ll use Qwen-VL-Chat, a powerful vision-language model:

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import transformers
import torch
from transformers.generation.beam_search import BeamSearchScorer
transformers.BeamSearchScorer = BeamSearchScorer

# Load Qwen-VL-Chat model
model_id = "Qwen/Qwen-VL-Chat"

model_vlm = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.float32
).to('cpu')
tokenizer_vlm = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Create the streamer
streamer = TextStreamer(tokenizer_vlm, skip_prompt=True)

The Data Structure

My experimental materials consist of 24 scenes:

  • 12 unergative scenes (swimming, running, barking, etc.)
  • 12 unaccusative scenes (boiling, shrinking, sinking, etc.)

Each scene pairs a character (octopus, ballerina, chef, etc.) with an action. Let’s create a dataframe with our materials:

# Unergative scenes
df_unerg = pd.DataFrame({
    "Filename": [
        "./pictures/octopus_swim.jpg",
        "./pictures/ballerina_run.jpg",
        "./pictures/boy_float.jpg",
        "./pictures/chef_yell.jpg",
        "./pictures/clown_walk.jpg",
        "./pictures/cowboy_wink.jpg",
        "./pictures/dog_bark.jpg",
        "./pictures/monkey_sleep.jpg",
        "./pictures/penguin_sneeze.jpg",
        "./pictures/pirate_cough.jpg",
        "./pictures/rabbit_smile.jpg",
        "./pictures/snail_crawl.jpg",
    ],
    "Sentence": [
        "The octopus is swimming.",
        "The ballerina is running.",
        "The boy is floating.",
        "The chef is yelling.",
        "The clown is walking.",
        "The cowboy is winking.",
        "The dog is barking.",
        "The monkey is sleeping.",
        "The penguin is sneezing.",
        "The pirate is coughing.",
        "The rabbit is smiling.",
        "The snail is crawling.",
    ]
})

# Unaccusative scenes
df_unacc = pd.DataFrame({
    "Filename": [
        "./pictures/octopus_boil.jpg",
        "./pictures/ballerina_shrink.jpg",
        "./pictures/boy_yawn.jpg",
        "./pictures/chef_drown.jpg",
        "./pictures/clown_grow.jpg",
        "./pictures/cowboy_fall.jpg",
        "./pictures/dog_spin.jpg",
        "./pictures/monkey_trip.jpg",
        "./pictures/penguin_bounce.jpg",
        "./pictures/pirate_sink.jpg",
        "./pictures/rabbit_shake.jpg",
        "./pictures/snail_melt.jpg",
    ],
    "Sentence": [
        "The octopus is boiling.",
        "The ballerina is shrinking.",
        "The boy is yawning.",
        "The chef is drowning.",
        "The clown is growing.",
        "The cowboy is falling.",
        "The dog is spinning.",
        "The monkey is tripping.",
        "The penguin is bouncing.",
        "The pirate is sinking.",
        "The rabbit is shaking.",
        "The snail is melting.",
    ]
})
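Because each character appears once per verb type, the two frames can be sanity-checked for pairing before any scoring. The sketch below relies only on the `character_verb.jpg` filename convention visible above. The helper name `split_item` is hypothetical, and for brevity the lists contain just two of the twelve paths per condition.

```python
import os


def split_item(path):
    """Split './pictures/octopus_swim.jpg' into ('octopus', 'swim')."""
    stem = os.path.splitext(os.path.basename(path))[0]
    character, verb = stem.split("_", 1)
    return character, verb


# Two example paths per condition, copied from the dataframes above
unerg_paths = ["./pictures/octopus_swim.jpg", "./pictures/dog_bark.jpg"]
unacc_paths = ["./pictures/octopus_boil.jpg", "./pictures/dog_spin.jpg"]

unerg_chars = {split_item(p)[0] for p in unerg_paths}
unacc_chars = {split_item(p)[0] for p in unacc_paths}

# Characters that appear in only one condition (empty when fully paired)
unpaired = unerg_chars ^ unacc_chars
print(unpaired)  # set()
```

With the real data one could pass `df_unerg["Filename"]` and `df_unacc["Filename"]` in place of the short lists.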

Computing Similarity Scores

Now for the main event. For each image-sentence pair, we’ll compute CLIP’s similarity score. This tells us how well the model thinks the image matches the text.

def compute_clip_similarity(df, model, preprocess, device):
    """
    Compute CLIP similarity scores for image-text pairs.

    Parameters:
    -----------
    df : pandas.DataFrame
        DataFrame with 'Filename' and 'Sentence' columns
    model : CLIP model
        Loaded CLIP model
    preprocess : function
        CLIP preprocessing function
    device : str
        'cuda' or 'cpu'

    Returns:
    --------
    pandas.DataFrame
        Original dataframe with added 'CLIP_Similarity' column
    """
    similarity_scores = []

    for _, row in df.iterrows():
        img_path = row['Filename']
        text = row['Sentence']

        # Preprocess image and tokenize text
        img = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
        text_tokenized = clip.tokenize([text]).to(device)

        # Compute similarity
        with torch.no_grad():
            logits_per_image, _ = model(img, text_tokenized)
            similarity_score = logits_per_image.item()

        similarity_scores.append(similarity_score)

    # Add scores to dataframe
    df_copy = df.copy()
    df_copy['CLIP_Similarity'] = similarity_scores

    return df_copy

def compute_subject_salience(df, model, preprocess, device):
    """
    Compute CLIP similarity scores for subject noun alone.
    This measures how visually salient/easy to identify the subject is.
    
    Parameters:
    -----------
    df : pandas.DataFrame
        DataFrame with 'Filename' and 'Sentence' columns
    model : CLIP model
        Loaded CLIP model
    preprocess : function
        CLIP preprocessing function
    device : str
        'cuda' or 'cpu'
    
    Returns:
    --------
    pandas.DataFrame
        Original dataframe with added 'Subject_Salience' column
    """
    subject_scores = []
    
    for _, row in df.iterrows():
        img_path = row['Filename']
        sentence = row['Sentence']
        
        # Extract subject noun (assumes format "The X is ...")
        # Extract word after "The " and before " is"
        subject = sentence.split("The ")[1].split(" is")[0]
        
        # Preprocess image and tokenize subject
        img = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
        text_tokenized = clip.tokenize([subject]).to(device)
        
        # Compute similarity
        with torch.no_grad():
            logits_per_image, _ = model(img, text_tokenized)
            similarity_score = logits_per_image.item()
        
        subject_scores.append(similarity_score)
    
    df_copy = df.copy()
    df_copy['Subject_Salience'] = subject_scores
    
    return df_copy
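The `split`-based subject extraction works for the fixed "The X is ..." frame, but it raises an unhelpful `IndexError` on any sentence that does not start with "The ". A hedged alternative, assuming the same sentence frame, is a regular expression that fails with a clear message. `extract_subject` is a hypothetical helper, not part of the original analysis.

```python
import re

# Assumed frame: "The <subject> is <verb>ing."
SUBJECT_PATTERN = re.compile(r"^The (?P<subject>.+?) is \w+ing\.$")


def extract_subject(sentence):
    """Return the subject noun phrase of a 'The X is V-ing.' sentence."""
    match = SUBJECT_PATTERN.match(sentence)
    if match is None:
        raise ValueError(f"Sentence does not fit 'The X is V-ing.': {sentence!r}")
    return match.group("subject")


print(extract_subject("The octopus is swimming."))  # octopus
```

Failing loudly matters here because a silently wrong subject string would still produce a plausible-looking salience score.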

We can also use a multimodal LLM to verify the image-sentence match in a different way. Instead of computing similarity scores, we’ll ask the model to rate how well the sentence describes the image:

def compute_qwen_scores(df, model, tokenizer, streamer=None):
    """
    Compute verification scores using Qwen-VL-Chat multimodal LLM.

    Parameters:
    -----------
    df : pandas.DataFrame
        DataFrame with 'Filename' and 'Sentence' columns
    model : Qwen-VL-Chat model
        Loaded Qwen model
    tokenizer : AutoTokenizer
        Qwen tokenizer
    streamer : TextStreamer, optional
        Streamer for real-time output

    Returns:
    --------
    pandas.DataFrame
        Original dataframe with added 'VLM_Score' and 'VLM_Response' columns
    """
    import re
    scores = []
    responses = []

    for idx, row in df.iterrows():
        img_path = row['Filename']
        sentence = row['Sentence']

        # Create query for Qwen-VL-Chat
        query = tokenizer.from_list_format([
            {'image': img_path},
            {'text': f'Rate how well this sentence describes the image: "{sentence}"\nScore from 1-10 (1=mismatch, 10=perfect match). Reply with just the number.'},
        ])

        # Generate response
        with torch.no_grad():
            response, _ = model.chat(tokenizer, query=query, history=None, streamer=streamer)

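        # Extract numeric score (falls back to the scale midpoint, 5.0)
        try:
            match = re.search(r'(\d+(?:\.\d+)?)', response)
            score = float(match.group(1)) if match else 5.0
            score = min(10.0, max(1.0, score))  # Clamp to 1-10
        except (TypeError, ValueError):
            # e.g. the model returned something that is not a string
            score = 5.0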

        scores.append(score)
        responses.append(response)

    df_copy = df.copy()
    df_copy['VLM_Score'] = scores
    df_copy['VLM_Response'] = responses

    return df_copy
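One design choice worth flagging: when no number is found in the reply, the function above substitutes 5.0, which is then indistinguishable from a genuine mid-scale rating. Below is a sketch of an alternative parser that returns `None` instead, so unparseable replies can be inspected or dropped. `parse_rating` is a hypothetical name and the example replies are invented.

```python
import re


def parse_rating(response, low=1.0, high=10.0):
    """Return the first number in `response` clamped to [low, high], or None."""
    match = re.search(r"\d+(?:\.\d+)?", response)
    if match is None:
        return None
    return min(high, max(low, float(match.group(0))))


print(parse_rating("8"))          # 8.0
print(parse_rating("Score: 12"))  # 10.0 (clamped to the top of the scale)
print(parse_rating("No idea."))   # None
```

With only 24 items, even one or two imputed 5.0 values could noticeably shift a group mean, so keeping missing replies visible seems the safer default.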

Let’s run this on both datasets. To avoid re-computing the slow VLM scores on every render, we cache results to a CSV file:

import os

CACHE_FILE = "./cached_scores.csv"


if os.path.exists(CACHE_FILE):
    df_all = pd.read_csv(CACHE_FILE)
else:
    # Compute CLIP similarities
    df_unerg_clip = compute_clip_similarity(df_unerg, model_clip, preprocess, device)
    df_unacc_clip = compute_clip_similarity(df_unacc, model_clip, preprocess, device)
    
    # Compute subject salience scores
    df_unerg_subj = compute_subject_salience(df_unerg, model_clip, preprocess, device)
    df_unacc_subj = compute_subject_salience(df_unacc, model_clip, preprocess, device)

    # Compute Qwen-VL scores
    df_unerg_vlm = compute_qwen_scores(df_unerg, model_vlm, tokenizer_vlm, streamer=streamer)
    df_unacc_vlm = compute_qwen_scores(df_unacc, model_vlm, tokenizer_vlm, streamer=streamer)

    # Combine CLIP scores with VLM scores and subject salience
    df_unerg_scored = df_unerg_clip.copy()
    df_unerg_scored['Subject_Salience'] = df_unerg_subj['Subject_Salience']
    df_unerg_scored['VLM_Score'] = df_unerg_vlm['VLM_Score']
    df_unerg_scored['VLM_Response'] = df_unerg_vlm['VLM_Response']
    df_unerg_scored['VerbType'] = 'Unergative'

    df_unacc_scored = df_unacc_clip.copy()
    df_unacc_scored['Subject_Salience'] = df_unacc_subj['Subject_Salience']
    df_unacc_scored['VLM_Score'] = df_unacc_vlm['VLM_Score']
    df_unacc_scored['VLM_Response'] = df_unacc_vlm['VLM_Response']
    df_unacc_scored['VerbType'] = 'Unaccusative'

    # Combine for analysis
    df_all = pd.concat([df_unerg_scored, df_unacc_scored], ignore_index=True)

    # Save to cache
    df_all.to_csv(CACHE_FILE, index=False)

print(df_all.head())
              Filename                   Sentence  CLIP_Similarity  \
0   ./octopus_swim.jpg   The octopus is swimming.        29.137495   
1  ./ballerina_run.jpg  The ballerina is running.        27.731918   
2      ./boy_float.jpg       The boy is floating.        20.843243   
3      ./chef_yell.jpg       The chef is yelling.        27.878561   
4     ./clown_walk.jpg      The clown is walking.        27.077477   

   Subject_Salience  VLM_Score  VLM_Response    VerbType  
0         28.454519        8.0             8  Unergative  
1         25.250607        7.0             7  Unergative  
2         21.628622        1.0             1  Unergative  
3         28.490120        8.0             8  Unergative  
4         26.241133        8.0             8  Unergative  

Descriptive Results

Let’s start by looking at the descriptive statistics across all three metrics:

# Create comparison plot with all three metrics
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# CLIP full sentence results
sns.pointplot(data=df_all, x='VerbType', y='CLIP_Similarity',
              hue='VerbType', palette=['#3498db', '#e74c3c'], 
              ax=axes[0], errorbar='ci', capsize=0.1, 
              linestyle='none', markers='o', legend=False)
sns.stripplot(data=df_all, x='VerbType', y='CLIP_Similarity',
              color='black', alpha=0.5, size=8, ax=axes[0], jitter=0.2)

axes[0].set_xlabel('Verb Type', fontsize=14, fontweight='bold')
axes[0].set_ylabel('CLIP Similarity Score', fontsize=14, fontweight='bold')
axes[0].set_title('Full Sentence Similarity',
                  fontsize=16, fontweight='bold', pad=20)

for verb_type in ['Unergative', 'Unaccusative']:
    mean_val = df_all[df_all['VerbType'] == verb_type]['CLIP_Similarity'].mean()
    axes[0].text(0 if verb_type == 'Unergative' else 1, mean_val + 1,
                 f'M = {mean_val:.2f}', ha='center', fontsize=12, fontweight='bold')

# Subject salience results
sns.pointplot(data=df_all, x='VerbType', y='Subject_Salience',
              hue='VerbType', palette=['#3498db', '#e74c3c'], 
              ax=axes[1], errorbar='ci', capsize=0.1, 
              linestyle='none', markers='o', legend=False)
sns.stripplot(data=df_all, x='VerbType', y='Subject_Salience',
              color='black', alpha=0.5, size=8, ax=axes[1], jitter=0.2)

axes[1].set_xlabel('Verb Type', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Subject Salience Score', fontsize=14, fontweight='bold')
axes[1].set_title('Subject Noun Identifiability',
                  fontsize=16, fontweight='bold', pad=20)

for verb_type in ['Unergative', 'Unaccusative']:
    mean_val = df_all[df_all['VerbType'] == verb_type]['Subject_Salience'].mean()
    axes[1].text(0 if verb_type == 'Unergative' else 1, mean_val + 0.5,
                 f'M = {mean_val:.2f}', ha='center', fontsize=12, fontweight='bold')

# VLM results
sns.pointplot(data=df_all, x='VerbType', y='VLM_Score',
              hue='VerbType', palette=['#3498db', '#e74c3c'], 
              ax=axes[2], errorbar='ci', capsize=0.1, 
              linestyle='none', markers='o', legend=False)
sns.stripplot(data=df_all, x='VerbType', y='VLM_Score',
              color='black', alpha=0.5, size=8, ax=axes[2], jitter=0.2)

axes[2].set_xlabel('Verb Type', fontsize=14, fontweight='bold')
axes[2].set_ylabel('Qwen-VL Match Score (1-10)', fontsize=14, fontweight='bold')
axes[2].set_title('Scene Verification (Qwen-VL)',
                  fontsize=16, fontweight='bold', pad=20)

for verb_type in ['Unergative', 'Unaccusative']:
    mean_val = df_all[df_all['VerbType'] == verb_type]['VLM_Score'].mean()
    axes[2].text(0 if verb_type == 'Unergative' else 1, mean_val + 0.3,
                 f'M = {mean_val:.2f}', ha='center', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.savefig('./model_comparison_plot.png', dpi=300, bbox_inches='tight')
plt.show()

A Deeper Dive with Bayesian Analysis

While the plots above give us a good first look, they don’t tell the whole story. To really understand the strength of the evidence, we need to go beyond just comparing averages. This is where Bayesian analysis comes in.

Instead of just getting a single number for the difference, a Bayesian regression gives us a full range of plausible values for the effect of VerbType on our scores, along with a measure of our certainty.

For the nerds out there, I used Pyro to run three separate models: two simple linear regressions for the CLIP and Subject Salience scores, and an ordered logistic regression for the VLM scores (since they are on a 1-10 scale). In all models, the key parameter is beta, which represents the estimated difference between unaccusative and unergative verbs.
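Written out, the models look roughly like this. This is a sketch read off the Pyro code, where x_i is the sum-coded verb type (-0.5 for unergative, +0.5 for unaccusative), y_i is the score for item i, the intercept and slope get Normal(0, 10) priors, and sigma gets a HalfNormal(10) prior.

```latex
% Linear models (CLIP similarity and subject salience), item i
y_i \sim \mathrm{Normal}(\alpha + \beta x_i,\ \sigma), \qquad
x_i \in \{-0.5,\ +0.5\}

% Ordered logistic model (VLM score), with ordered cutpoints c_k
\eta_i = \alpha + \beta x_i, \qquad
P(y_i \le k) = \frac{1}{1 + e^{-(c_k - \eta_i)}}
```

With this ±0.5 coding, beta in the linear models is the difference between the unaccusative and unergative group means, and the intercept is the midpoint of the two. In the ordinal model, beta is the same contrast on the latent log-odds scale.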

Here’s the code to set up and run the models:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

# Prepare data for Pyro
# We'll center the scores and code VerbType numerically
df_pyro = df_all.copy()
df_pyro['VerbType_num'] = df_pyro['VerbType'].map({'Unergative': -0.5, 'Unaccusative': 0.5})
df_pyro['CLIP_centered'] = df_pyro['CLIP_Similarity'] - df_pyro['CLIP_Similarity'].mean()
df_pyro['Subject_centered'] = df_pyro['Subject_Salience'] - df_pyro['Subject_Salience'].mean()
vlm_score_tensor = torch.tensor(df_pyro['VLM_Score'].values, dtype=torch.long)

# Convert to tensors
verb_type_tensor = torch.tensor(df_pyro['VerbType_num'].values, dtype=torch.float32)
clip_tensor = torch.tensor(df_pyro['CLIP_centered'].values, dtype=torch.float32)
subject_tensor = torch.tensor(df_pyro['Subject_centered'].values, dtype=torch.float32)

# --- Model for CLIP Similarity ---
def clip_model(verb_type, obs=None):
    intercept = pyro.sample('intercept', dist.Normal(0., 10.))
    beta = pyro.sample('beta', dist.Normal(0., 10.))
    sigma = pyro.sample('sigma', dist.HalfNormal(10.))
    mu = intercept + beta * verb_type
    with pyro.plate('data', len(verb_type)):
        pyro.sample('obs', dist.Normal(mu, sigma), obs=obs)

# --- Model for Subject Salience ---
def subject_model(verb_type, obs=None):
    intercept = pyro.sample('intercept', dist.Normal(0., 10.))
    beta = pyro.sample('beta', dist.Normal(0., 10.))
    sigma = pyro.sample('sigma', dist.HalfNormal(10.))
    mu = intercept + beta * verb_type
    with pyro.plate('data', len(verb_type)):
        pyro.sample('obs', dist.Normal(mu, sigma), obs=obs)
        
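# --- Model for VLM Score (Ordered Logistic) ---
# OrderedLogistic expects integer categories 0..K-1. The scores here run
# from 1 upward, so K = max + 1 and category 0 is simply never observed.
k_categories = vlm_score_tensor.max().item() + 1
k_cutpoints = k_categories - 1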
def vlm_model(verb_type, obs=None):
    alpha = pyro.sample('alpha', dist.Normal(0., 10.))
    beta = pyro.sample('beta', dist.Normal(0., 10.))
    with pyro.plate("cutpoints_plate", k_cutpoints):
        raw_cutpoints = pyro.sample('raw_cutpoints', dist.Normal(torch.arange(k_cutpoints).float(), 1.))
    cutpoints = torch.sort(raw_cutpoints)[0]
    latent_propensity = alpha + beta * verb_type
    with pyro.plate('data', len(verb_type)):
        pyro.sample('obs', dist.OrderedLogistic(latent_propensity, cutpoints), obs=obs)

# Run the MCMC samplers
mcmc_clip = MCMC(NUTS(clip_model), num_samples=2000, warmup_steps=1000)
mcmc_clip.run(verb_type_tensor, clip_tensor)
clip_samples = mcmc_clip.get_samples()

mcmc_subject = MCMC(NUTS(subject_model), num_samples=2000, warmup_steps=1000)
mcmc_subject.run(verb_type_tensor, subject_tensor)
subject_samples = mcmc_subject.get_samples()

mcmc_vlm = MCMC(NUTS(vlm_model), num_samples=2000, warmup_steps=1000, num_chains=1)
mcmc_vlm.run(verb_type_tensor, vlm_score_tensor)
vlm_samples = mcmc_vlm.get_samples()
Sample: 100%|██████████| 3000/3000 [00:10, 289.51it/s, step size=6.87e-01, acc. prob=0.928]
prob=0.791]Sample:  79%|███████▊  | 2361/3000 [00:34, 83.39it/s, step size=1.75e-01, acc. prob=0.791]Sample:  79%|███████▉  | 2370/3000 [00:34, 82.81it/s, step size=1.75e-01, acc. prob=0.791]Sample:  79%|███████▉  | 2379/3000 [00:34, 81.36it/s, step size=1.75e-01, acc. prob=0.792]Sample:  80%|███████▉  | 2388/3000 [00:34, 81.59it/s, step size=1.75e-01, acc. prob=0.792]Sample:  80%|███████▉  | 2397/3000 [00:34, 77.14it/s, step size=1.75e-01, acc. prob=0.792]Sample:  80%|████████  | 2405/3000 [00:34, 77.85it/s, step size=1.75e-01, acc. prob=0.792]Sample:  80%|████████  | 2413/3000 [00:34, 74.18it/s, step size=1.75e-01, acc. prob=0.792]Sample:  81%|████████  | 2421/3000 [00:34, 69.89it/s, step size=1.75e-01, acc. prob=0.792]Sample:  81%|████████  | 2432/3000 [00:35, 78.49it/s, step size=1.75e-01, acc. prob=0.793]Sample:  81%|████████▏ | 2440/3000 [00:35, 77.79it/s, step size=1.75e-01, acc. prob=0.792]Sample:  82%|████████▏ | 2451/3000 [00:35, 82.28it/s, step size=1.75e-01, acc. prob=0.792]Sample:  82%|████████▏ | 2460/3000 [00:35, 82.20it/s, step size=1.75e-01, acc. prob=0.792]Sample:  82%|████████▏ | 2469/3000 [00:35, 78.75it/s, step size=1.75e-01, acc. prob=0.792]Sample:  83%|████████▎ | 2477/3000 [00:35, 78.94it/s, step size=1.75e-01, acc. prob=0.791]Sample:  83%|████████▎ | 2485/3000 [00:35, 71.08it/s, step size=1.75e-01, acc. prob=0.791]Sample:  83%|████████▎ | 2494/3000 [00:35, 74.18it/s, step size=1.75e-01, acc. prob=0.791]Sample:  83%|████████▎ | 2502/3000 [00:35, 75.67it/s, step size=1.75e-01, acc. prob=0.791]Sample:  84%|████████▎ | 2510/3000 [00:36, 74.35it/s, step size=1.75e-01, acc. prob=0.792]Sample:  84%|████████▍ | 2518/3000 [00:36, 72.75it/s, step size=1.75e-01, acc. prob=0.791]Sample:  84%|████████▍ | 2526/3000 [00:36, 70.65it/s, step size=1.75e-01, acc. prob=0.791]Sample:  85%|████████▍ | 2536/3000 [00:36, 77.70it/s, step size=1.75e-01, acc. prob=0.791]Sample:  85%|████████▍ | 2545/3000 [00:36, 80.95it/s, step size=1.75e-01, acc. 
prob=0.792]Sample:  85%|████████▌ | 2554/3000 [00:36, 83.50it/s, step size=1.75e-01, acc. prob=0.792]Sample:  85%|████████▌ | 2563/3000 [00:36, 83.03it/s, step size=1.75e-01, acc. prob=0.791]Sample:  86%|████████▌ | 2572/3000 [00:36, 80.42it/s, step size=1.75e-01, acc. prob=0.792]Sample:  86%|████████▌ | 2581/3000 [00:36, 74.71it/s, step size=1.75e-01, acc. prob=0.792]Sample:  86%|████████▋ | 2589/3000 [00:37, 75.03it/s, step size=1.75e-01, acc. prob=0.792]Sample:  87%|████████▋ | 2598/3000 [00:37, 76.96it/s, step size=1.75e-01, acc. prob=0.792]Sample:  87%|████████▋ | 2609/3000 [00:37, 81.49it/s, step size=1.75e-01, acc. prob=0.792]Sample:  87%|████████▋ | 2618/3000 [00:37, 79.24it/s, step size=1.75e-01, acc. prob=0.791]Sample:  88%|████████▊ | 2626/3000 [00:37, 79.17it/s, step size=1.75e-01, acc. prob=0.791]Sample:  88%|████████▊ | 2636/3000 [00:37, 81.73it/s, step size=1.75e-01, acc. prob=0.792]Sample:  88%|████████▊ | 2645/3000 [00:37, 75.42it/s, step size=1.75e-01, acc. prob=0.791]Sample:  88%|████████▊ | 2653/3000 [00:37, 74.65it/s, step size=1.75e-01, acc. prob=0.792]Sample:  89%|████████▊ | 2662/3000 [00:37, 77.86it/s, step size=1.75e-01, acc. prob=0.791]Sample:  89%|████████▉ | 2671/3000 [00:38, 81.12it/s, step size=1.75e-01, acc. prob=0.791]Sample:  89%|████████▉ | 2680/3000 [00:38, 79.13it/s, step size=1.75e-01, acc. prob=0.791]Sample:  90%|████████▉ | 2688/3000 [00:38, 78.07it/s, step size=1.75e-01, acc. prob=0.792]Sample:  90%|████████▉ | 2696/3000 [00:38, 74.21it/s, step size=1.75e-01, acc. prob=0.791]Sample:  90%|█████████ | 2705/3000 [00:38, 75.22it/s, step size=1.75e-01, acc. prob=0.791]Sample:  90%|█████████ | 2714/3000 [00:38, 77.04it/s, step size=1.75e-01, acc. prob=0.791]Sample:  91%|█████████ | 2724/3000 [00:38, 79.09it/s, step size=1.75e-01, acc. prob=0.791]Sample:  91%|█████████ | 2733/3000 [00:38, 82.02it/s, step size=1.75e-01, acc. prob=0.790]Sample:  91%|█████████▏| 2742/3000 [00:39, 79.57it/s, step size=1.75e-01, acc. 
prob=0.790]Sample:  92%|█████████▏| 2750/3000 [00:39, 77.38it/s, step size=1.75e-01, acc. prob=0.790]Sample:  92%|█████████▏| 2758/3000 [00:39, 75.75it/s, step size=1.75e-01, acc. prob=0.790]Sample:  92%|█████████▏| 2766/3000 [00:39, 72.66it/s, step size=1.75e-01, acc. prob=0.790]Sample:  92%|█████████▏| 2774/3000 [00:39, 70.62it/s, step size=1.75e-01, acc. prob=0.791]Sample:  93%|█████████▎| 2783/3000 [00:39, 74.69it/s, step size=1.75e-01, acc. prob=0.791]Sample:  93%|█████████▎| 2791/3000 [00:39, 76.06it/s, step size=1.75e-01, acc. prob=0.791]Sample:  93%|█████████▎| 2799/3000 [00:39, 77.17it/s, step size=1.75e-01, acc. prob=0.791]Sample:  94%|█████████▎| 2807/3000 [00:39, 71.55it/s, step size=1.75e-01, acc. prob=0.791]Sample:  94%|█████████▍| 2816/3000 [00:40, 76.55it/s, step size=1.75e-01, acc. prob=0.791]Sample:  94%|█████████▍| 2824/3000 [00:40, 74.98it/s, step size=1.75e-01, acc. prob=0.790]Sample:  94%|█████████▍| 2833/3000 [00:40, 79.14it/s, step size=1.75e-01, acc. prob=0.790]Sample:  95%|█████████▍| 2843/3000 [00:40, 81.63it/s, step size=1.75e-01, acc. prob=0.790]Sample:  95%|█████████▌| 2852/3000 [00:40, 73.17it/s, step size=1.75e-01, acc. prob=0.791]Sample:  95%|█████████▌| 2860/3000 [00:40, 69.08it/s, step size=1.75e-01, acc. prob=0.791]Sample:  96%|█████████▌| 2868/3000 [00:40, 68.85it/s, step size=1.75e-01, acc. prob=0.791]Sample:  96%|█████████▌| 2878/3000 [00:40, 75.08it/s, step size=1.75e-01, acc. prob=0.791]Sample:  96%|█████████▌| 2887/3000 [00:40, 76.84it/s, step size=1.75e-01, acc. prob=0.791]Sample:  96%|█████████▋| 2895/3000 [00:41, 77.51it/s, step size=1.75e-01, acc. prob=0.791]Sample:  97%|█████████▋| 2904/3000 [00:41, 79.81it/s, step size=1.75e-01, acc. prob=0.792]Sample:  97%|█████████▋| 2914/3000 [00:41, 83.34it/s, step size=1.75e-01, acc. prob=0.792]Sample:  97%|█████████▋| 2923/3000 [00:41, 82.90it/s, step size=1.75e-01, acc. prob=0.792]Sample:  98%|█████████▊| 2932/3000 [00:41, 82.31it/s, step size=1.75e-01, acc. 
prob=0.792]Sample:  98%|█████████▊| 2941/3000 [00:41, 84.25it/s, step size=1.75e-01, acc. prob=0.792]Sample:  98%|█████████▊| 2951/3000 [00:41, 87.82it/s, step size=1.75e-01, acc. prob=0.792]Sample:  99%|█████████▊| 2960/3000 [00:41, 85.89it/s, step size=1.75e-01, acc. prob=0.792]Sample:  99%|█████████▉| 2969/3000 [00:41, 84.69it/s, step size=1.75e-01, acc. prob=0.792]Sample:  99%|█████████▉| 2978/3000 [00:42, 78.99it/s, step size=1.75e-01, acc. prob=0.792]Sample: 100%|█████████▉| 2987/3000 [00:42, 79.86it/s, step size=1.75e-01, acc. prob=0.792]Sample: 100%|█████████▉| 2996/3000 [00:42, 72.30it/s, step size=1.75e-01, acc. prob=0.792]Sample: 100%|██████████| 3000/3000 [00:42, 70.79it/s, step size=1.75e-01, acc. prob=0.792]

After running the analysis, we can extract the posterior distributions for our beta parameter in each model. Let’s see what they tell us.

# Get posterior samples and print results
# Note: torch.quantile gives an equal-tailed 95% interval, which the labels below call "HDI"
clip_beta_mean = clip_samples['beta'].mean().item()
clip_beta_hdi = torch.quantile(clip_samples['beta'], torch.tensor([0.025, 0.975]))

print(f"\nCLIP Similarity - Bayesian Regression:")
print(f"  Beta (VerbType effect): {clip_beta_mean:.3f}")
print(f"  95% HDI: [{clip_beta_hdi[0]:.3f}, {clip_beta_hdi[1]:.3f}]")
print(f"  P(beta < 0): {(clip_samples['beta'] < 0).float().mean():.3f}")

subject_beta_mean = subject_samples['beta'].mean().item()
subject_beta_hdi = torch.quantile(subject_samples['beta'], torch.tensor([0.025, 0.975]))

print(f"\nSubject Salience - Bayesian Regression:")
print(f"  Beta (VerbType effect): {subject_beta_mean:.3f}")
print(f"  95% HDI: [{subject_beta_hdi[0]:.3f}, {subject_beta_hdi[1]:.3f}]") 
print(f"  P(beta < 0): {(subject_samples['beta'] < 0).float().mean():.3f}")

vlm_beta_mean = vlm_samples['beta'].mean().item()
vlm_beta_hdi = torch.quantile(vlm_samples['beta'], torch.tensor([0.025, 0.975]))

print(f"\nVLM Score - Ordered Logistic Regression:")
print(f"  Beta (VerbType effect): {vlm_beta_mean:.3f}")
print(f"  95% HDI: [{vlm_beta_hdi[0]:.3f}, {vlm_beta_hdi[1]:.3f}]")
print(f"  P(beta < 0): {(vlm_samples['beta'] < 0).float().mean():.3f}")

CLIP Similarity - Bayesian Regression:
  Beta (VerbType effect): -2.245
  95% HDI: [-5.305, 0.815]
  P(beta < 0): 0.928

Subject Salience - Bayesian Regression:
  Beta (VerbType effect): -1.386
  95% HDI: [-4.651, 1.821]
  P(beta < 0): 0.799

VLM Score - Ordered Logistic Regression:
  Beta (VerbType effect): -2.182
  95% HDI: [-3.942, -0.415]
  P(beta < 0): 0.994
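A note on the intervals: the code above takes the 2.5% and 97.5% quantiles, which is an equal-tailed interval rather than a highest-density interval in the strict sense. For roughly symmetric posteriors the two are nearly identical. Here is a minimal sketch of how a narrowest-interval HDI could be computed instead. The function names and the skewed example draws are my own illustration, not part of the original analysis.

```python
import numpy as np

def equal_tailed_interval(samples, prob=0.95):
    """Interval between the lower and upper tail quantiles."""
    tail = (1.0 - prob) / 2.0
    lo, hi = np.quantile(samples, [tail, 1.0 - tail])
    return float(lo), float(hi)

def hdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the samples (assumes one mode)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))             # number of samples inside the interval
    widths = x[k - 1:] - x[: n - k + 1]    # width of every window of k sorted samples
    start = int(np.argmin(widths))         # the narrowest window wins
    return float(x[start]), float(x[start + k - 1])

# Hypothetical skewed draws standing in for posterior samples
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=20_000)

eti = equal_tailed_interval(draws)
hd = hdi(draws)
print(f"equal-tailed: [{eti[0]:.2f}, {eti[1]:.2f}]")
print(f"HDI:          [{hd[0]:.2f}, {hd[1]:.2f}]")
```

To apply it to the models here, something like `hdi(clip_samples['beta'].numpy())` should work, assuming the samples live on the CPU.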

How to Read This Plot

We used contrast coding for the verb-type predictor: Unaccusatives were coded as +0.5 and Unergatives as −0.5. With this coding, our “Beta” (β) is the difference between the two conditions (Unaccusative minus Unergative), so a negative β means lower scores for Unaccusatives.
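To see why the ±0.5 coding makes β a plain difference, here is a small sketch using ordinary least squares instead of the Bayesian models from this post. The scores are made up purely for illustration.

```python
import numpy as np

# Hypothetical scores, invented for this example only
unacc = np.array([20.0, 22.0, 19.0, 21.0])   # coded +0.5
unerg = np.array([24.0, 25.0, 23.0, 26.0])   # coded -0.5

y = np.concatenate([unacc, unerg])
x = np.concatenate([np.full(unacc.size, 0.5), np.full(unerg.size, -0.5)])

# Ordinary least squares fit of y = alpha + beta * x
X = np.column_stack([np.ones_like(x), x])
alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]

diff = unacc.mean() - unerg.mean()
print(f"beta  = {beta:.2f}")
print(f"unaccusative mean minus unergative mean = {diff:.2f}")
print(f"alpha = {alpha:.2f}")
```

The slope comes out equal to the difference between the two group means, and with equal group sizes the intercept is the grand mean. A prior in the Bayesian version can pull β toward zero, but the interpretation of its sign stays the same.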

  1. The Zero Line (The “Null”)

The vertical gray line at 0 represents “no difference”. If a model’s “cigar” is centered here, it means the model treats both picture types the same.

  2. The Left Side (Negative β)

If the distribution is on the left, the score for Unaccusatives was lower than for Unergatives.

The Finding: This is the “Danger Zone” for our stimuli. It means Unaccusative pictures are harder for the models to understand or identify.

Our Result: Both Full Scene (CLIP) and Scene Verification (VLM) are shifted heavily to the left. This tells us that, visually speaking, the unaccusative pictures are likely less clear or representative than the unergative ones, although only the VLM interval excludes zero.

  3. The Right Side (Positive β)

If the distribution were on the right, it would mean Unaccusatives scored higher.

The Finding: This would suggest Unaccusative pictures are actually “better” or “easier” than Unergative ones.

Our Result: None of our models show this.

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Data dictionary from your MCMC samples
beta_data = {
    'Full Scene (CLIP)': clip_samples['beta'].numpy(),
    'Subject Salience (CLIP)': subject_samples['beta'].numpy(),
    'Scene Verification (VLM)': vlm_samples['beta'].numpy()
}

# Set the style before creating the figure so it applies to the new axes
sns.set_style("whitegrid", {'axes.grid': True, 'grid.color': '.95'})
# Adjust figure size for better vertical separation
fig, ax = plt.subplots(figsize=(10, 7))

labels = list(beta_data.keys())
colors = ['#3498db', '#9b59b6', '#e74c3c']

for i, label in enumerate(labels):
    samples = beta_data[label]
    mean_val = samples.mean()
    
    # 1. Calculate multiple intervals for the "stacking" effect
    hdi_95 = np.percentile(samples, [2.5, 97.5])
    hdi_80 = np.percentile(samples, [10, 90])
    hdi_50 = np.percentile(samples, [25, 75])
    
    # 2. Plot the stacked lines (Bottom to Top: thinnest/widest first)
    # 95% Interval - Thin
    ax.hlines(i, hdi_95[0], hdi_95[1], color=colors[i], linewidth=1.5, alpha=0.4, zorder=1)
    # 80% Interval - Medium
    ax.hlines(i, hdi_80[0], hdi_80[1], color=colors[i], linewidth=5.0, alpha=0.7, zorder=2)
    # 50% Interval - Thick
    ax.hlines(i, hdi_50[0], hdi_50[1], color=colors[i], linewidth=10.0, alpha=1.0, zorder=3)
    
    # 3. Plot the Mean point
    ax.plot(mean_val, i, 'o', color='white', markersize=8, zorder=4)
    
    # 4. Perfectly Aligned Statistics
    p_dir = (samples < 0).mean() if mean_val < 0 else (samples > 0).mean()
    prob_text = f"$P(\\beta {'<' if mean_val < 0 else '>' } 0) = {p_dir:.2f}$"
    
    # Locked to y-coordinate 'i' and x-coordinate 3.0 (outside plot area)
    ax.text(3.0, i, prob_text, va='center', ha='left', 
            fontsize=13, fontweight='bold', color=colors[i])

# 5. Descriptive Annotations (The "How to Read" Guide)
ax.axvline(x=0, color='black', linestyle='-', linewidth=1.5, alpha=0.6, zorder=0)

# Arrow pointing Left (Negative Beta)
ax.annotate('', xy=(-5, -1.0), xytext=(-0.5, -1.0),
            arrowprops=dict(arrowstyle="->", color='gray', lw=1.5))
ax.text(-2.75, -1.4, "Lower Scores for\nUnaccusatives", ha='center', color='gray', fontweight='bold')

# Arrow pointing Right (Positive Beta)
ax.annotate('', xy=(2.5, -1.0), xytext=(0.5, -1.0),
            arrowprops=dict(arrowstyle="->", color='gray', lw=1.5))
ax.text(1.5, -1.4, "Lower Scores for\nUnergatives", ha='center', color='gray', fontweight='bold')

# 6. Final Layout Polish
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels, fontweight='bold', fontsize=12)
ax.set_xlabel('Posterior Beta Weight (Unaccusative vs. Unergative)', fontsize=13, labelpad=45)

# Lock limits so text and arrows don't shift
ax.set_xlim(-6, 3)
ax.set_ylim(-1.5, len(labels) - 0.5)

sns.despine(left=True, bottom=True)
plt.subplots_adjust(right=0.75, bottom=0.2) # Make room for text on right and guide on bottom
plt.savefig('./model_pyro.png', dpi=300, bbox_inches='tight')
plt.show()

  • Let’s think about the results again. The 95% intervals for the two CLIP measures include zero, but there is still a more than moderate chance that unaccusative pictures are harder to process and that subject salience is lower in unaccusatives.

| Metric | Posterior β (Effect) | Direction & Certainty |
|---|---|---|
| Scene Verification (VLM) | ≈ -2.2 | Strong Negative Effect: The VLM consistently rates Unaccusative scenes lower. P(β<0)=0.99 indicates very high certainty. |
| Full Scene (CLIP) | ≈ -2.2 | Strong Negative Effect: Similar to the VLM, CLIP shows lower similarity for Unaccusative scenes. P(β<0)=0.93 is fairly robust, though the 95% interval includes zero. |
| Subject Salience (CLIP) | ≈ -1.4 | Moderate Negative Effect: The subject is slightly harder to identify in Unaccusative scenes, but the evidence is weaker (P=0.80) and the interval is wider (more uncertainty). |

  • Even before we look at human data, the AI models are telling us: “These pictures aren’t equal.” The unaccusative scenes have a lower “visual-textual fit,” which means we must be careful not to mistake this perceptual “clutter” for a purely linguistic planning effect.

  • One thing this predicts is that the effects we are seeing may be partially due to the difficulty of the pictures.

  • However, given the general picture in the sentence production literature, this seems unlikely. Sauppe’s group found a number of advance planning cases in which participants were faster to start speaking when they did not need to plan the verbal elements ahead.

  • Similarly, Momma and Yoshida have shown advance planning in sentence-recall experiments. They showed that people were slower to start speaking sentences such as ‘Which computer did you buy and repair?’ when a verb related to ‘repair’ was present.

  • Importantly, this only happened with ‘ATB’-type sentences, and not with parasitic gap sentences such as ‘Which computer did you repair after buying?’.

Conclusion

The Finding

The analysis reveals a consistent pattern across all three metrics: unaccusative scenes are rated as more difficult or less representative by the models compared to unergative scenes.

  1. Scene Verification (VLM): The Qwen-VL model, which was asked to explicitly rate the match between the sentence and the image, showed a strong negative effect for unaccusatives. It consistently gave lower scores to unaccusative pairs, with a high degree of certainty (P(β<0) = 0.99). This suggests that from a generative, “common sense” perspective, the unaccusative sentences are poorer descriptions of their corresponding images.

  2. Full Scene Similarity (CLIP): The standard CLIP similarity score also revealed a strong negative effect for unaccusatives (P(β<0) = 0.93). This indicates that the overall visual-textual fit is lower for unaccusative scenes.

  3. Subject Salience (CLIP): Even the salience of the subject noun was moderately lower in unaccusative scenes (P(β<0) = 0.80). While the evidence is weaker here, it suggests that the subject may be slightly harder to identify in the context of an unaccusative event.

In short, the models are telling us that the unaccusative pictures are not as clear-cut as the unergative ones.

What This Means

This computational analysis provides a crucial piece of context for the human experimental results. The key takeaway is that the unaccusative stimuli seem to be inherently more complex or ambiguous than the unergative stimuli.

This doesn’t invalidate the syntactic hypothesis about advance planning, but it does add a layer of nuance. The increased processing cost observed in human speakers for unaccusative sentences might not be solely due to a syntactic operation. Instead, it could be a combination of factors:

  • Perceptual/Conceptual Difficulty: The visual scenes for unaccusative events might be harder to parse, conceptualize, and map onto a linguistic description. The AI models, particularly the VLM, seem to be picking up on this.
  • Syntactic Planning: The syntactic structure of unaccusatives may still require earlier planning, as originally hypothesized.

The most likely scenario is that these two factors are intertwined. The very nature of unaccusative events (a change of state happening to a patient) makes them visually more complex, and this complexity might be what triggers the earlier, more resource-intensive syntactic planning.

We can be more confident that the experimental effects are not just due to simple visual confounds like a hidden subject, but we must also acknowledge that the “difficulty” is not purely syntactic. It’s a property of the entire event, from perception to syntax.

Broader Implications

I think this kind of analysis represents something really exciting about modern psycholinguistics. We’re not just running experiments and hoping for the best—we’re using computational tools to validate our materials in ways that weren’t possible even a few years ago.

Vision-language models like CLIP and multimodal LLMs like Qwen give us principled ways to ask: “Are these pictures doing what we think they’re doing?” The fact that we can now triangulate across different model architectures (similarity-based vs. generative) makes the validation even stronger. CLIP provides fast, quantitative similarity scores, while multimodal LLMs can provide more nuanced, human-interpretable ratings.

The convergence of evidence from both CLIP and multimodal LLMs provides a robust validation framework for experimental materials.

If you want to run this analysis yourself, you can use the following Colab notebook:

Open In Colab

Final Thoughts

This analysis didn’t change my theoretical interpretation of the experimental findings—but it made me much more confident in them. And that’s exactly what good methodological work should do.

If you’re running experiments with visual stimuli, I highly recommend giving this kind of analysis a try. Both CLIP and multimodal LLMs like Qwen2-VL are freely available, relatively easy to use, and can give you valuable insights into whether your materials are doing what you think they’re doing. The fact that you can now validate your stimuli using multiple computational approaches—from simple similarity scoring to sophisticated multimodal reasoning—provides unprecedented confidence in your experimental materials.

Plus, it’s just fun to see what these models “think” about your carefully crafted experimental stimuli. Sometimes they agree with each other and with you. Sometimes they surprise you. Either way, you learn something.


References

Momma, S., & Ferreira, V. (2019). Beyond linear order: The role of argument structure in speaking. Cognitive Psychology, 114, 101228.

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning (pp. 8748-8763). PMLR.

Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P., … & Zhou, J. (2023). Qwen-VL: A frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966.


Session Info

For reproducibility, here’s my setup:

import sys
import torch

print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CLIP: (installed from https://github.com/openai/CLIP)")
print(f"Transformers: (for Qwen-VL-Chat)")
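If you want actual version numbers for the other packages too, here is a small sketch using the standard library. The distribution names in the list are my assumptions about how the packages were installed (for example, Pyro is published on PyPI as `pyro-ppl`), so adjust them to your environment.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "not installed"

# Assumed distribution names; change these to match your own setup
packages = ["torch", "transformers", "pyro-ppl", "numpy", "seaborn", "matplotlib"]

for name in packages:
    print(f"{name}: {installed_version(name)}")
```

A package installed straight from a Git repository, like CLIP above, may report a version that says little about the exact commit, so it is worth noting the commit hash separately.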

Footnotes

  1. However, an interesting sidenote is that we do not really know if human cognition is also propositional.↩︎

  2. These models run very slowly because they are extremely resource-hungry. This post took so long because I was waiting for the results to come in.↩︎

  3. There are of course other ways to test this. For example, Griffin & Bock (2000) used a free-production task where participants were not given an initial word to use with the pictures. They quantified how many different words participants used for each picture, called that variable the ‘codability’ of the picture, and tested whether codability was related to onset latency. Egurtzegi et al. (2022) used a similar approach.↩︎